Bayesian Error Estimation in Density Functional Theory 
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We present a practical scheme for performing error estimates for Density Functional Theory calcu- 
lations. The approach which is based on ideas from Bayesian statistics involves creating an ensemble 
of exchange-correlation functionals by comparing with an experimental database of binding energies 
for molecules and solids. Fluctuations within the ensemble can then be used to estimate errors 
relative to experiment on calculated quantities like binding energies, bond lengths, and vibrational 
frequencies. It is demonstrated that the error bars on energy differences may vary by orders of 
magnitude for different systems in good agreement with existing experience. 

PACS numbers: 71.15.Mb, 31.15.Ew, 02.50.Ng, 68.43.Bc 



Over the past few decades the use of Density Func- 
tional Theory (DFT)Q to predict structures, energet- 
ics, and other properties of atomic-scale systems has 
spread to many different fields and the number of appli- 
cations has grown enormously. Today applications may 
vary from studies of chemical reactions in heterogeneous 
catalysis through geophysical investigations of melting 
at the physical conditions of the Earth's core0, Q to 
studies of biomolecular systems like DNA 5, 6J. The gen- 
eral usefulness of the calculations lies in their unbiased 
"first principles" character and the relatively high degree 
of predictive power and reliability which has been estab- 
lished. With respect to the latter, it is however often 
difficult to assess directly to which extent a calculated 
quantity - this being a molecular bond length or some 
other property - is to be trusted. In practice the evalu- 
ation often falls back exclusively on the experience and 
acquired insight of the person performing the calculation. 

In this letter we present a scheme for systematic er- 
ror evaluation within the generalized-gradient approxi- 
mation (GGA) DFT. The scheme is based on ideas from 
Bayesian statistics in which an ensemble of models or 
model parameters are assigned probabilities by compar- 
ing to a database of experimental results. The ensemble 
generated can then subsequently be used to estimate er- 
ror bars on model predictions within the considered class 
of models. The scheme is simple to apply and amounts in 
the end to only a few additional non-self-consistent eval- 
uations of exchange-correlation functionals. In spirit, it 
is in fact close to a rather common practice within the 
field of DFT-GGA calculations: To asses the validity of 
a calculated DFT-GGA result it is common to try out 
different versions of the GGA-functional or perhaps to 
compare with a local-density approximation (LDA) re- 
sult. The scheme presented here provides a systematic 
framework for such an approach. 

The statistical approach we use is inspired by Bayesian 
statistics Q and was further developed in the context of 
modeling complex biomedical networks || and for con- 



struction of interatomic potentials 0. The main ingre- 
dients in the approach are a model M which is given 
by a set of parameters 9, and a database D. In our 
case the model will be a GGA-type exchange-correlation 
functional described through a number of parameters 9 
and the database D will consist of experimental atomiza- 
tion/cohesive energies E™ p for a collection of molecules 
and solids (details below). We now define a probability 
distribution in the space of model parameters (given the 
choice of model and the database) through 



P(6\MD) ~ exp(-C(0)/T), 



(1) 



where C denotes a standard least-squares cost function 
= \ Y,k(Ek{0) ~ E c h xp ) 2 with E k {B) being the at- 
omization/cohesive energy of system fc in the database 
calculated with the parameters 9. The "effective temper- 
ature" T determines the spread of the ensemble. In sim- 
ple fitting procedures only the best-fit parameters 9b. t., 
which are obtained when the cost function takes on its 
minimal value Cb.f. , are considered. Here, in contrast, a 
whole ensemble of parameter sets are considered leading 
to a spread of values on model predictions. Following 
Ref. 0, we take the value of the effective temperature to 
be given by the minimal (best fit) value of the cost func- 
tion Cb.f. as T — 2Cb.f./A r p , where N p is the number of 
parameters. For a harmonic cost function each parame- 
ter will then on the average contribute an additional cost 
of T/2 = Cb.i./Np so that the ensemble will probe model 
parameters where the cost function is in the range from 
the minimal value Cb.f. up to of the order a few times 
this value. This choice was demonstrated to work well 
in the case of error estimation for interatomic potentials 

a _ 

The model we shall consider is GGA-DFT 10] where 
the exchange functional is a local function of the den- 
sity n and its dimcnsionlcss gradient s = |Vn|/(2fcpri) 
(ri = fcp/(37r 2 )). Several suggestions for different math- 
ematical forms of the exchange-functional within GGA 
exist[I3 EH, El- A commonly used class of these in- 
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eluding PW-91 13], PBe|T[|, revPBEQ and RPBE0 
differ by only the choice of the so-called enhancement 
factor F x (s) in the exchange energy E x ; 

EM = J drn(r)e^ A (n(r))F x ( S ), (2) 

where e LDA (n) = — 3fcp/(47r) (for a spin polarized den- 
sity we have E x [n-[,n\\ = [E x [2n-[] + E x [2n{\)/2), The 
enhancement factors for PBE and RPBE are shown in 
Fig. ^ In the following we shall expand the enhance- 
ment factor as 

N P , v 2i-2 

i— 1 ^ ^ 

regarding the 9's as free parameters. We use three param- 
eters (N p = 3) which a train/test check for our database 
has shown to give the optimal fit without over-fitting. 
The model space could be extended in future work to 
include a fraction of exact exchange as for the B3LYP 
functional 16] . 




FIG. 1: Exchange enhancement factors as a function of di- 
mensionless electron density gradient. The gray lines show en- 
hancement factors drawn from the ensemble exp(— C(6)/T). 
The dashed, dotted and full lines show enhancement factors 
for PBE, RPBE and the best fit respectively. 

The database we use consists of the experimental at- 
omization energies of the molecules H2, LiH, CH4, NH3, 
OH, H 2 0, HF, Li 2 , LiF, Be 2 , C 2 H 2 , C 2 H 4 , HCN, CO, 
N 2 , NO, 2 , F 2 , P 2 and Cl 2 and the experimental cohe- 
sive energies (per atom) of the solids Na, Li, Si, C, SiC, 
A1P, MgO, NaCl, LiF, Cu and Pt. In the cost function 
all systems in the database appear with the same unit 
weight. 

All calculations are performed with a real-space multi- 
grid DFT code0 using the projector augmented wave 
method^] to describe the core regions. All calculated 
energy differences have been converged to an accuracy 
better than 50 meV with respect to number of grid 



points, unit-cell size (for the molecules and atoms) and 
number of k-points (for the solids) . The electron density 
is calculated self-consistently using the PBE-functional 
and the evaluation of the exchange-correlation energy for 
other enhancement factors are performed using the PBE 
density. This is a very good approximation due to the 
variational principle. Energies are calculated at the cal- 
culated equilibrium bond distances. 

Since Eq. @ is linear in the parameters 9, the total 
energy of a given system will also be linear in 9: 

3 

E{e)=E a +Y J ^E i e h (4) 

i=i 

where the coefficients Eq and AEi only have to be cal- 
culated once for each system. It is therefore very fast to 
calculate energy values for different enhancement factors 
in the ensemble. 

Minimizing the cost function with respect to the 
three parameters leads to the best-fit enhancement fac- 
tor shown in Fig. ^ corresponding to the parameters 
h{ = (1.0008,0.1926,1.8962). The function is seen to 
follow quite closely the PBE enhancement factor at low 
values of the gradient s. In particular it is nearly one 
in the homogeneous limit (s = 0) which is exclusively a 
result of the fitting. For s-values greater than 1.5 the 
best-fit enhancement factor increases more steeply than 
PBE being more similar to the RPBE factor. In Table |I] 
the resulting errors are shown for LDA, PBE, RPBE and 
for the best fit. RPBE performs better on the molecules 
and PBE is better for the solids; the best fit represents 
a compromise between the two. We would like to stress 
that the main point of this letter is not to derive an im- 
proved functional. Much experience has been acquired 
concerning how well different GGA functionals work for 
different systems 0, 113, and we do not expect to ob- 
tain a large overall improvement within this simple GGA 
framework. But as we shall see in the following the en- 
semble construction allows for realistic evaluation of the 
error bars on calculated quantities. 

The cost function appearing in the probability distri- 
bution Eq. is very nearly quadratic in the model pa- 
rameters in the relevant range of parameter space. We 
can therefore expand the exponent in the probability dis- 
tribution as C(9)/T = const. + \A9 T AA9, where A is 
a symmetric matrix. With U being the unitary matrix 
that diagonalizes A (AU — UK) , we can finally write the 
parameters of the enhancement factors in the ensemble 
as 

= 9 h± + UA- 1/2 a 

( 0.066 0.055 -0.034\ (a x \ 
= #b.f. + -0.812 0.206 0.007 \ \ a 2 , (5) 
\ 1.996 0.082 0.004 / \a 3 ) 

where the ot\, a 2 , and a 3 are stochastic variables which 
are Gaussian distributed with unit width: V(oti) ~ 



TABLE I: Errors in DFT atomization energies and cohesive 
energies (in eV) relative to experiment. All calculations are 
based on self-consistent PBE densities. Experimental num- 
bers are taken from Refs. Ill I2I I2II2II2I I2I 



exp(— aq/2). Using this formula it is very easy to gener- 
ate a properly distributed ensemble of enhancement fac- 
tors as shown in Fig. ^ 

The key suggestion of this letter is that for a given cal- 
culated observable, say a bond length of a molecule, the 
variation of the calculated value of this observable within 
the ensemble of enhancement factors provides a useful es- 
timate of how large the error of the best fit value is com- 
pared to experiment. From an ensemble of parameters, 
9\ 9 2 , . . . , 9 N , generated from Eq. (JSJ, the standard de- 
viation ctbee(O) which we shall refer to as the Bayesian 
error estimate (BEE) of the observable O can be deter- 
mined. Considering O as a function of 9 the BEE is 
evaluated as 



obee(O) 



\ 



1 N 



o 



best-fit ) 



(6) 



In the simple case where the observable is approxi- 
mately linear in the parameters 9 the BEE can be calcu- 
lated without explicitly generating an ensemble through 
ctbee(O) = (ELi (dO/dai) 2 ) 1 / 2 and Eq. Q. Further 
details can be found in our implementation of the ap- 
proach in the Atomistic Simulation Environment |27| . 

The ensemble in Eq. © is around the best-fit enhance- 
ment factor corresponding to ai = 0. However, consid- 
ering that the ensemble of enhancement factors (Fig. 
is quite wide compared with the difference between for 
example the PBE and the best-fit functional it seems 
reasonable to alternatively apply the fluctuations around 
either the PBE or RPBE functional. 

It is useful to consider the ratio (Obest-fit — 
Oexp)/<7BEE(0) of the actual error relative to the esti- 
mate. For any given observable in a particular system 



this ratio is just a single number so in order to assess 
whether our approach produces reliable error estimates 
from a statistical point of view we need to look at the dis- 
tribution of ratios for several observables and systems. 
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FIG. 2: Histograms of actual error relative to the BEE 
(Obest-flt— Oex P )/o"BEE(0) for different quantities. Top: atom- 
ization/cohesive energies. Middle: bond lengths and lattice 
constants. Bottom: vibrational frequencies and bulk moduli. 
The dashed lines show the expected normal distribution. Ex- 
perimental numbers are taken from Refs. is mm in Hi. 

In Fig. we show histograms of the relative error for 
different observables for all the (molecular and solid) sys- 
tems in the database. The upper panel shows the distri- 
bution in the case of the atomization/cohesive energies. 
As can be seen the distribution agrees quite well with a 
Gaussian distribution of unit width indicating that the 
error estimates are in fact reasonable for the binding ener- 
gies. The individual standard deviations for the cohesive 
energies are in the range from 0.09 eV for Na to 0.75 eV 
for Al and for the atomization energies the range is from 
0.07 eV for Li2 to 0.60 eV for C2H4. As an example, the 
GGA estimate of the cohesive energy of Na (experimen- 
tal value is 1.11 eV) is 1.02 ± 0.09 eV. The middle panel 
in Fig. [21 shows the relative error histogram for the bond 
lengths (for both molecules and solids) and relative er- 
rors for the molecular frequencies and solid bulk moduli 
is shown in the lower panel |30|. For both distributions, 
we see that the BEE's give reasonable estimates of the 
actual errors. 

It is well-known for experienced users of DFT calcu- 
lations that the reliability with which energy differences 
can be calculated vary dramatically depending on the 
particular system. The BEE catches this behavior as 
can be seen for example by comparing the cohesive en- 
ergy and the bcc-fcc structural energy difference for bulk 
copper (Fig. I3J). The BEE for the cohesive energy is 0.5 
eV while the error bar on the structural energy differ- 
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error bars in some cases by picking a database focused on 
a particular type of systems. Using for example only the 
subset of molecular systems in our database a different 
best-fit functional is obtained where the mean absolute 
error is only 0.15 eV compared to the 0.24 eV in Table 
[I] With this functional the fluctuations and hence the 
BEEs will be reduced by almost a factor of two. However, 
applying this functional to bulk metals would lead to 
larger errors which would be underestimated. 

Summarizing, we propose a simple way to estimate a 
large portion of the systematic error for DFT-GGA cal- 
culations, requiring only a few extra non-self-consistent 
energy calculations to calculate the error bars of any ob- 
servable which is a function of energies. 

We acknowledge support from the Carlsberg Foun- 
dation and the Danish Center for Scientific Computing 
through Grant No. HDW-1101-05. 



FIG. 3: Upper panel: Calculated ensemble for cohesive energy 
(x-axis) and bcc-fcc energy difference (y-axis) for a copper 
crystal. The BEE's are indicated by error bars. The inset uses 
rescaled axes. Lower panel: Calculated ensemble for binding 
energy (x-axis) and bridge-top energy difference for CO on 
a Cu(100) surface. Values for the experimentally preferred 
states p^. l3~H (fee and top) are indicated by vertical dotted 
lines. Units: eV. 

ence is two orders of magnitudes smaller (4 meV). The 
small structural energy difference is seen to be signifi- 
cantly greater than zero. The high reliability with which 
small structural energy differences can be calculated for 
bulk metals is confirmed by the fact that for almost all 
metals the correct equilibrium structures are predicted 
from the calculations |32j| . 

For chemisorption systems the BEE for the energy dif- 
ference between chemisorption at two difference surface 
sites may also be somewhat smaller than the error bar 
for the total chemisorption energy as illustrated in the 
case of CO on Cu(100) in Fig.|3 However as can be seen 
from the figure the error bar on the site-preference is so 
large that the preferred chemisorption site cannot be re- 
liably determined. This is in good agreement with the 
fact that for a number of CO-metal chemisorption sys- 
tems DFT-GGA calculations do in fact predict a wrong 
chemisorption site|33|. 

It should be noted that in general the error estimates 
depend on both the choice of database and the class of 
models (here the GGA's). This means that if some piece 
of physics is not present in the database or if the model is 
completely unable to describe a particular feature unre- 
alistic estimates may occur. For example all GGA's are 
unable to properly describe the long-range van der Waals 
interactions and hence the BEE's will be unrealistic for 
that type of interactions. 

We further note, that it may be possible to reduce the 
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